ChemSpot: a hybrid system for chemical named entity recognition

Authors

  • Tim Rocktäschel
  • Michael Weidlich
  • Ulf Leser
Abstract

MOTIVATION The accurate identification of chemicals in text is important for many applications, including the computer-assisted reconstruction of metabolic networks and the retrieval of information about substances in drug development. However, due to the diversity of naming conventions and traditions for such molecules, this task is highly complex and should be supported by computational tools. RESULTS We present ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and International Union of Pure and Applied Chemistry (IUPAC) entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field (CRF) with a dictionary. It achieves an F1 measure of 68.1% on the SCAI corpus, outperforming the only other freely available chemical NER tool, OSCAR4, by 10.8 percentage points. AVAILABILITY ChemSpot is freely available at: http://www.informatik.hu-berlin.de/wbi/resources.
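The abstract does not say how the CRF and dictionary outputs are combined or how mentions are matched during evaluation. The sketch below illustrates the general idea of a hybrid span merge plus a span-level F1 score, under two loudly stated assumptions: the merge rule (keep every CRF mention, add a dictionary mention only if it overlaps no CRF mention) and strict exact-span matching are both hypothetical choices, not ChemSpot's documented behaviour. The names `merge_mentions` and `f1_exact` are illustrative only, and the spans are written by hand rather than produced by a real CRF or dictionary.

```python
from typing import List, Tuple

# A mention is a (start, end) character span, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge_mentions(crf: List[Span], dictionary: List[Span]) -> List[Span]:
    """Hypothetical hybrid rule: keep all CRF mentions and add a
    dictionary mention only when it overlaps no CRF mention."""
    merged = list(crf)
    for d in dictionary:
        if not any(overlaps(d, c) for c in crf):
            merged.append(d)
    return sorted(set(merged))


def f1_exact(predicted: List[Span], gold: List[Span]) -> float:
    """Span-level F1 assuming exact boundary matching (an assumption)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Aspirin and 2-acetoxybenzoic acid were compared."
    crf = [(12, 33)]                 # "2-acetoxybenzoic acid" (IUPAC-style name)
    dictionary = [(0, 7), (29, 33)]  # "Aspirin" (trivial name), "acid" (overlaps CRF)
    gold = [(0, 7), (12, 33)]

    merged = merge_mentions(crf, dictionary)
    print([text[s:e] for s, e in merged])  # ['Aspirin', '2-acetoxybenzoic acid']
    print(round(f1_exact(crf, gold), 3))     # CRF alone: 0.667
    print(round(f1_exact(merged, gold), 3))  # hybrid: 1.0
```

The toy example mirrors the abstract's motivation: a sequence model tends to capture long systematic names, while a dictionary contributes trivial names, so their union can raise recall. On the abstract's own figures, a 10.8-point margin over ChemSpot's 68.1% corresponds to an F1 of about 57.3% for OSCAR4 on SCAI.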

Related articles

Ensemble Approach to Extract Chemical Named Entity by Using Results of Multiple CNER Systems with Different Characteristic

We propose a novel ensemble-based chemical named entity recognition (CNER) tool that combines, using a machine learning (ML) technique, the outputs of CNER tools with different characteristics, such as OSCAR4 and ChemSpot. Since this tool may identify the typical errors of one CNER tool by using the other tools' output, our system outperforms ChemSpot (ML-based) and OSCAR4 (rule-based) in their original settings.

A Web Prototype for Detecting Chemical Compounds and Drugs

This paper introduces a web prototype for named entity recognition of chemical compounds and drugs. The tool is based on a system developed to participate in the CHEMDNER task organized as part of the BioCreative 2013 workshop. The system combines the ChemSpot tool with a set of semantic-based rules, which were defined according to the guidelines provided to task participants. The prototype is...

Extended Feature Set for Chemical Named Entity Recognition and Indexing

The BioCreative IV CHEMDNER task provides participants with the opportunity to compare their methods for chemical named entity recognition (NER) and indexing in a controlled environment. We contributed to this task with our previous conditional random field-based system [1], extended by a number of novel general and domain-specific features. For the latter, we used features derived from two exis...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three kinds of methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvements in the performance of such systems may affect the quality of these subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Journal:
  • Bioinformatics

Volume 28, Issue 12

Publication date: 2012